Skip to content

MF3-H02: lock home channel before reading status in event handlers#820

Merged
philanton merged 3 commits into
fix/audit-findings-finalx3from
fix/mf3-h02
Jun 8, 2026
Merged

MF3-H02: lock home channel before reading status in event handlers#820
philanton merged 3 commits into
fix/audit-findings-finalx3from
fix/mf3-h02

Conversation

@philanton

Copy link
Copy Markdown
Contributor

Finding

Home-channel event handlers read the channel via GetChannelByID before acquiring LockUserState, then mutated/persisted that pre-lock snapshot. LockUserState locks the user balance row, not the channel row — so a concurrent submit_state Finalize could flip the channel Open→Closing in the read→lock window.

HandleHomeChannelCheckpointed's non-challenged path persists the channel snapshot verbatim (it only reassigns Status on the Challenged branch). A stale Open snapshot therefore overwrote the committed Closing, reopening a finalized channel — bypassing the C-01 remediation and enabling the post-finalize epoch-rebind / fake-HomeDeposit spend chain.

Fix

New DBStore.LockUserStateForHomeChannel(channelID):

  • Derives the (wallet, asset) lock key from the channel inside SQL — no pre-lock Go snapshot exists.
  • Postgres: ensures the balance row, then SELECT c.* FROM channels c JOIN user_balances b ... WHERE c.channel_id = ? FOR UPDATE OF b — locks the balance row (same row/strength as LockUserState) and reads the channel in one statement. The stale-snapshot window is structurally impossible, not just guarded.
  • sqlite (tests): resolve → LockUserState → read.

All four home-channel handlers (Created, Checkpointed, Challenged, Closed) now lock-then-read via this method. Escrow handlers are unchanged (different status surface, out of scope).

Wired into core.ChannelHubEventHandlerStore, evm.ChannelHubReactorStore, database.DatabaseStore + both mocks.

Tests

  • Rewired 25 home-handler tests to the new mock; escrow tests untouched.
  • TestHandleHomeChannelCheckpointed_DoesNotReopenFinalizedChannel — post-lock Closing snapshot must persist Closing, never reopen to Open.
  • TestDBStore_LockUserStateForHomeChannel — returns channel + ensures balance row; nil for missing channel.

go build ./..., go vet, and go test (event_handlers, evm, store/database) pass.

Note: the Postgres branch (FOR UPDATE OF on the join) is covered by the gated store/database/test integration suite, not the default run — same convention as the existing LockUserState Postgres branch.

🤖 Generated with Claude Code

…lers

MF3-H02: home-channel event handlers read the channel via GetChannelByID
before acquiring LockUserState, then acted on that pre-lock snapshot.
LockUserState locks the user balance row, not the channel row, so a
concurrent submit_state Finalize could flip the channel Open->Closing in
the read->lock window. HandleHomeChannelCheckpointed's non-challenged path
persists the snapshot verbatim, so the stale Open clobbered the committed
Closing and reopened a finalized channel.

Add DBStore.LockUserStateForHomeChannel: it derives the (wallet, asset)
lock key from the channel inside SQL and, on Postgres, locks the balance
row and reads the channel in one statement (SELECT ... JOIN ... FOR UPDATE
OF b), so no stale pre-lock snapshot can exist. All four home-channel
handlers (Created, Checkpointed, Challenged, Closed) now lock-then-read via
this method. Escrow handlers are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 77daab08-cb23-47d0-bed4-b769a4762ff6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/mf3-h02

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ihsraham ihsraham left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the handler-side change is moving in the right direction: the home-channel event handlers now go through one helper before mutating status, and the production reactor path is transaction-wrapped.

I am requesting changes on one blocker because the Postgres helper still does not guarantee a post-lock channel refresh, so H-02 is not fully closed yet. I left the specific note inline.

I am not treating the current SDK/MCP audit failure as part of this review, since that dependency fix is being handled separately and should be pulled in before merge.

Comment thread nitronode/store/database/db_store.go Outdated
@philanton philanton changed the base branch from main to fix/audit-findings-finalx3 June 5, 2026 13:41
LockUserStateForHomeChannel previously locked the balance row and read the
channel in a single `SELECT c.* ... FOR UPDATE OF b`. In READ COMMITTED,
`FOR UPDATE OF b` only re-evaluates the locked user_balances row after the
lock wait; the joined channels row is served from the statement-start
snapshot. A checkpoint handler could begin with status=Open, block on the
balance lock while a concurrent Finalize committed status=Closing, then
return the stale Open channel and persist it.

Acquire the (wallet, asset) balance lock first, then read the channel in a
separate statement so it takes a fresh snapshot reflecting any transition
committed during the wait.

Add a Postgres concurrency regression that holds the balance lock, flips
status to Closing, and commits while the call is blocked; it asserts the
returned status is Closing (skips on non-postgres).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@ihsraham ihsraham left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving for H-02 closure. The new commit fixes the stale snapshot issue by acquiring the (wallet, asset) balance lock first and reading the channel in a separate statement after the lock is held. The added Postgres regression covers the wait-and-refresh case Anton described.

Non-blocking follow-up: that regression only runs with TEST_DB_DRIVER=postgres, so a dedicated pg CI lane would make this easier to keep covered. I am not treating that as a blocker for this fix.

Comment thread nitronode/event_handlers/service.go Outdated
Comment on lines +60 to +67
// Acquire the user's balance-row lock and read the channel under it before mutating
// channel status or backfilling the off-chain head. Receiver/release issuance paths
// lock the same row up front and then re-check Status via CheckActiveChannel; without
// this lock an in-flight RPC can read Status=Void, decide to store an unsigned receiver
// row, and commit before we flip to Open — leaving the latest head unsigned on a
// technically opened channel. The lock+read is a single store call so the channel
// snapshot is consistent with the lock. See HandleHomeChannelCheckpointed and
// HandleHomeChannelClosed for the same pattern.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment needs shortening. Specifying that "user's balance-row lock guards against race-conditions of channel status being changed in the meantime" should be enough.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, trimmed it down in 7f6d275 — now just says the lock guards against a concurrent submit_state flipping status between the read and our write, plus the see-also pointer.

@nksazonov nksazonov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean implementation of the lock-before-read pattern across all four home-channel event handlers. The new regression test TestHandleHomeChannelCheckpointed_DoesNotReopenFinalizedChannel directly pins the race described in the audit finding.

Comment thread nitronode/store/database/db_store.go
Comment thread nitronode/store/database/channel_test.go Outdated
Comment thread pkg/core/interface.go Outdated
- db_store: document READ COMMITTED isolation prerequisite for the
  post-lock channel refresh (negated under REPEATABLE READ/SERIALIZABLE)
- core/store interface: narrow LockUserStateForHomeChannel doc — channel
  is read under the lock on postgres; pre-lock snapshot on sqlite (tests)
- channel_test: replace fixed time.Sleep with a deterministic pg_locks
  poll so the wait-and-refresh race is actually contended, not a no-op
- service: shorten HandleHomeChannelCreated lock comment

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@philanton philanton merged commit 2e4d054 into fix/audit-findings-finalx3 Jun 8, 2026
7 checks passed
@philanton philanton deleted the fix/mf3-h02 branch June 8, 2026 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants